Laboratory 5.

Sentiment Analysis

NA count per column:

##                   id                brand           categories 
##                    0                    0                    0 
##            dateAdded          dateUpdated                  ean 
##                    0                    0                31979 
##                 keys         manufacturer   manufacturerNumber 
##                    0                    0                  203 
##                 name         reviews.date    reviews.dateAdded 
##                    0                   67                    0 
##     reviews.dateSeen  reviews.didPurchase  reviews.doRecommend 
##                    0                38886                10615 
##           reviews.id   reviews.numHelpful       reviews.rating 
##                38886                38536                    0 
##   reviews.sourceURLs         reviews.text        reviews.title 
##                    0                   34                  475 
##     reviews.userCity reviews.userProvince     reviews.username 
##                65634                70595                   95 
##                  upc 
##                    2
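A per-column NA count like the one above can be obtained in base R with `is.na()` and `colSums()`. The following is a minimal sketch, not the code used in this lab: the data frame `reviews` and its values are hypothetical stand-ins for the real dataset.

```r
# Toy data frame standing in for the reviews data (hypothetical values)
reviews <- data.frame(
  id = c(1, 2, 3),
  ean = c(NA, NA, "0123"),
  reviews.text = c("good", NA, "bad"),
  stringsAsFactors = FALSE
)

# is.na() returns a logical matrix with one column per variable;
# colSums() counts the TRUE values (the NAs) in each column
na_counts <- colSums(is.na(reviews))
print(na_counts)
```

Columns with a high count relative to the number of rows are natural candidates for removal.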

The following columns will be removed:

Convert the text to uppercase or lowercase

Remove special characters such as “#”, “@”, or apostrophes.

Remove URLs

Check for emoticons and remove them (unless they carry information)

Remove punctuation marks

Remove articles, prepositions, and conjunctions (stopwords)

Remove numbers if you consider that they will interfere with the predictions.
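The cleaning steps listed above can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the code used in this lab: the function `clean_text` and the short stopword vector `stop_words_demo` are hypothetical, and a real analysis would use a fuller stopword list. The sketch also assumes English reviews, so every non-ASCII character (emoji, curly quotes) is dropped.

```r
# Small illustrative stopword list (assumption; not a complete list)
stop_words_demo <- c("the", "a", "an", "of", "and", "to", "is", "it", "this", "i")

clean_text <- function(x, stopwords = stop_words_demo) {
  x <- tolower(x)                                            # lowercase everything
  x <- gsub("(https?://|www\\.)\\S+", " ", x, perl = TRUE)   # URLs, while "://" is still intact
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = " ")     # emoji and other non-ASCII characters
  x <- gsub("[[:punct:]]", " ", x)                           # punctuation, including # @ and '
  x <- gsub("[[:digit:]]+", " ", x)                          # numbers
  # Remove whole-word stopwords only (\\b marks word boundaries)
  pattern <- paste0("\\b(", paste(stopwords, collapse = "|"), ")\\b")
  x <- gsub(pattern, " ", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x))                               # collapse leftover whitespace
}

demo <- "I LOVE this album!!! 10/10 :) see https://example.com #music @fan it's great"
print(clean_text(demo))
```

URLs are removed before punctuation because stripping punctuation first would break them into ordinary-looking words. Replacing apostrophes with a space leaves fragments such as a lone "s" or "t", so those fragments need to be added to the stopword list if they should not be counted as words.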

##  [1] "brand"               "categories"          "manufacturer"       
##  [4] "manufacturerNumber"  "name"                "reviews.doRecommend"
##  [7] "reviews.rating"      "reviews.text"        "reviews.title"      
## [10] "reviews.username"    "upc"
## [1] "just awesome" "good"         "good"         "disappointed" "irritation"  
## [6] "not worth it"
## [1] " love  album   s  good     hip hop side   current pop sound    hype   listen   everyday   gym   give  star rating   way   metaphors  just crazy "
## [2] "good flavor   review  collected  part   promotion "
## [3] "good flavor "
## [4] " read   reviews    looking   buying one   couples lubricants    ultimately disappointed   didn t even live    reviews   read   starters  neither  boyfriend    notice  sort  enhanced   captivating  sensation     notice  however    messy consistency   reminiscent    liquid y vaseline    difficult  clean       pleasant  especially since  lacked   captivating  sensation     expecting   m disappointed   paid  much      lube   won t use      just use  normal personal lubricant    less money    less mess "
## [5] " husband bought  gel  us   gel caused irritation   felt like   burning  skin   wouldn t recommend  gel "
## [6] " boyfriend   bought   spice things    bedroom     highly disappointed   product   bought  one   absolutely love  ky   mine   thought     similar affect    absolutely nothing    buy "

Exploratory Analysis

Top 20 most frequent words:

## dfunlist
##               great   product     movie    review      part promotion collected 
##   1671020     21142     20464     20002     18929     18671     17734     17726 
##      love         t       use      good      like         s      skin       one 
##     17010     16425     16080     12328     11409     11262     10737     10335 
##      hair    really      just      will 
##      8999      8620      8493      8202
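The first entry in the table above has an empty label and by far the largest count. One plausible explanation, offered here as an assumption, is that splitting the cleaned text on single spaces produces empty strings wherever several spaces occur in a row. The sketch below uses a hypothetical toy vector `cleaned` to show how such empty tokens arise and how they can be filtered out before counting.

```r
# Toy cleaned reviews with repeated spaces (hypothetical data)
cleaned <- c(" love  album  good ", "good flavor ", " love  good ")

# Splitting on a single space yields "" for leading and repeated spaces
raw_tokens <- unlist(strsplit(cleaned, " ", fixed = TRUE))
n_empty <- sum(!nzchar(raw_tokens))

# Drop the empty tokens, then count and sort by frequency
tokens <- raw_tokens[nzchar(raw_tokens)]
freq <- sort(table(tokens), decreasing = TRUE)

print(n_empty)
print(head(freq, 20))
```

Splitting on the regular expression `" +"` instead of a single space reduces the number of empty tokens, but a leading space still produces one, so the `nzchar()` filter remains useful.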

Word frequency cloud:

#Histogram

#Discussion of the most frequent words:

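The most frequent words in the reviews refer to the product itself ("product", "movie") or describe it favorably ("great", "love", "good"), which points to a generally positive opinion of the products. The promotions also left a clear mark: "review", "collected", "part", and "promotion" each appear between roughly 17,700 and 18,900 times, so a large share of the reviews mention them.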

#Determining positive and negative words

## Joining, by = "word"
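library(dplyr)     # %>%, inner_join(), count(), ungroup()
library(tidytext)  # get_sentiments()

# Keep only the words found in the Bing lexicon and count them by sentiment
bing_word_counts <- prince_tidy %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()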
## Joining, by = "word"
bing_word_counts
## # A tibble: 2,932 × 3
##    word      sentiment     n
##    <chr>     <chr>     <int>
##  1 love      positive  17010
##  2 clean     positive   7910
##  3 easy      positive   5824
##  4 smell     negative   5279
##  5 recommend positive   4296
##  6 loved     positive   3990
##  7 soft      positive   3588
##  8 free      positive   3349
##  9 nice      positive   3275
## 10 funny     negative   2762
## # … with 2,922 more rows